Summary of out Data

Here is a quick summary of our data. We see how many times values occured in our data frame as well as what were the quartiles values for our data.

summary(sales_final)
##        sku        customer_name                ship_city    ship_state
##  FNG00015:10   CUSTOMER 1:58    TLAJOMULCO DE ZUNIGA:12   MX     :26  
##  FNG00014: 8   CUSTOMER 3: 9    CUAUTITLAN IZCALLI  :10   TB     :18  
##  FNG00030: 7   CUSTOMER 6: 9    VILLAHERMOSA        :10   JA     :15  
##  FNG00031: 7   CUSTOMER 2: 2    CHALCO              : 8   NL     :14  
##  FNG00033: 7   CUSTOMER 4: 2    CULIACAN            : 8   SI     : 8  
##  FNG00043: 7   CUSTOMER 5: 2    MONTERREY           : 8   CH     : 1  
##  (Other) :37   (Other)   : 1    (Other)             :27   (Other): 1  
##     zip_code  shipping_method package_24_shipment   cases_sold    
##  86280  :12   Delivery:83     Min.   :   175      Min.   :   252  
##  45679  : 8                   1st Qu.:  8898      1st Qu.: 12264  
##  56640  : 8                   Median : 20664      Median : 24840  
##  66550  : 8                   Mean   : 58039      Mean   : 47306  
##  80130  : 8                   3rd Qu.: 73386      3rd Qu.: 62118  
##  54769  : 6                   Max.   :326400      Max.   :213612  
##  (Other):33                                                       
##    avg_spend     
##  Min.   : 1.566  
##  1st Qu.: 7.142  
##  Median :12.391  
##  Mean   :10.950  
##  3rd Qu.:15.707  
##  Max.   :16.975  
## 

Visulization of Customers

Now lets take a much closer look and see in which cities customers are buying these water cases. ***

ggplot(sales_final, aes(x = customer_name, fill = ship_city))+
  geom_bar(color = 'black')+
  labs(x = 'Customer',  y= 'Where that customer is buying from', fill = 'City')

Visualization of Cases Sold

Here we show which city had the most sold cases. What was the breakdown of the shipments to each city. We can see which cities are the ones who are buying the most cases and which ones are buying the average amount of around 47306 cases. We can make cost efficient decisions in the next quarter so to maximize our orders with those cities that are buying the most cases.

#This is the total amount of cases that were shipped to each city
print(cities)
## # A tibble: 16 x 2
##    ship_city            total_sold
##    <fct>                     <int>
##  1 CHALCO                   410328
##  2 CHIHUAHUA                 12096
##  3 CIENEGA DE FLORES          4032
##  4 CUAUTITLAN IZCALLI       930676
##  5 CULIACAN                 541140
##  6 EL SALTO                  12628
##  7 ESCOBEDO                 197821
##  8 GOMEZ PALACIO              4032
##  9 GUADALUPE                 60192
## 10 MONTERREY                670048
## 11 RAMOS ARIZPE                616
## 12 TEOLOYUCAN                76552
## 13 TLAJOMULCO DE ZUNIGA     501454
## 14 TULTITLAN                 37392
## 15 VILLAHERMOSA             177761
## 16 VILLAHERMOSA TABASCO     289657
#Lets show the weight of total cases sold to each city
pie
ggplot(sales_final, aes(x = cases_sold, fill = ship_city))+
  geom_histogram(color = 'black')+
  labs(x = 'Cases Sold', fill = "City")
## `stat_bin()` using `bins = 30`. Pick better value with `binwidth`.

Showing our Data on a Map

Here we see that I divided up the information and plotted it on a map of Mexico the busiest place that the cases were shipped for Q1 of 2017. This is good to visualize where we can focus more of the shipments in the future as to be sure to maximize shipments and profits.

ggplot(mexico, aes(x = long, y = lat)) +
  geom_polygon(aes(group = group), fill = "white",
               color = "gray40", size = .2) +
  geom_point(data = sales_final2, aes(x = long, y = lat, size = cases_sold, color = ship_city))+
  scale_size_continuous(range = c(8,15))+
  labs(color = 'City of Destination', size = "Average cost of shipment", title ='Comparing the volume of shipment to each city in Mexico')

Lets look at the Distribution of Shipment Cost to each City

Here we have a boxplot that shows us the median price of shipping across all the cities in Q1. We see what the maximum value and minimum value were for shipping to that respective city. The average cost of shipment was $10.95.

ggplot(sales_final, aes(x = reorder(ship_city, avg_spend, median), y = avg_spend))+
  geom_boxplot()+
  theme_bw()+
  theme(axis.text.x=element_text(angle=45,hjust=1))+
  labs(x = "City", y = "Shipment Cost per Case(in Dollars)")

Regression between Cost per Case to Chip, The Amount of Cases Shipped, and 24 Pack Cases

Here we are showing the average shipping price of shipping cases of water against how many cases where sold at that price point. We then want to see how many complete 24 packs of water of those cases sold compare against to the price of the shipment. Since different products have different bottle sizes and pack sizes, so “24-pack equivalent” cases are used as a scale for a comparison between different product types.

ggplot(sales_final, aes(x = avg_spend, y = cases_sold))+
  geom_point(aes(size = package_24_shipment), alpha = 1/3)+
  labs(x = 'freight cost', y = 'cases sold', size = '24-pack equivalents')

Now we want to smooth out this data to see what sort of relationships each state state has with regards to the cases sold, shipping cost, and 24 pack equivalents. We can determine what relationship our data has form this information. As cases start to get larger we see which states comsume the most of our product, while others start to fall off or comsume less. We can also see at what amount states start to increase their consumptions. This informations is valuable as it would lead to know how many cases to estimate to each state which can result in having a more efficient second quarter.

ggplot(sales_final, aes(x = avg_spend, y = cases_sold))+
  geom_point(aes(size = package_24_shipment), alpha = 1/3)+
  geom_smooth(aes(color = ship_state), se = FALSE) +
  labs(x = 'freight cost', y = 'cases sold', size = '24-pack equivalents')
## `geom_smooth()` using method = 'loess'

Conclusion

With this data we can now make accurate estimations for quarter 2 of sales and freight. We can estimate how much it would cost to ship to these same locations in quarter 2 as well as know about how much each respective state and city will consume in the future. The data also tells us what the perfect amount of cases would be for each state as to now have them start to consume less product; therefore, making our shipment process much more efficient as we will know how much a specific state/city will expect to consume.